Introduce logical operators and work with them
Access elements of vectors and DFs using logical conditions
Inspect the values, ranges, and structure of a DF
Frequency count of a categorical variable
table(mtcars$cyl)
4 6 8
11 7 14
Multiple categories
cyl | Number of cylinders |
vs | Engine (0 = V-shaped, 1 = straight) |
table(mtcars$cyl, mtcars$vs)
     0  1
  4  1 10
  6  3  4
  8 14  0
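A two-way table is easier to read when its dimensions are named and totals are appended. The following is a minimal sketch; the labels passed to `dnn` are an assumption (they simply reuse the column names).

```r
# Cross-tabulate cylinders against engine shape, naming both dimensions
tab <- table(mtcars$cyl, mtcars$vs, dnn = c("cyl", "vs"))

# Append row and column totals
addmargins(tab)

# Share of each cyl/vs combination over all cars
round(prop.table(tab), 2)
```

`addmargins` and `prop.table` both take the table object itself, so the counts only need to be computed once.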
Show the minimum and maximum, the first and third quartiles, the median, and the mean of a vector or data frame.
summary(mtcars$mpg)
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.40 15.43 19.20 20.09 22.80 33.90
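Each figure reported by `summary` can also be obtained on its own. A short sketch using base R functions; the variable names `x` and `q` are only illustrative.

```r
x <- mtcars$mpg

# Individual statistics that summary() reports together
c(min = min(x), median = median(x), mean = mean(x), max = max(x))

# First and third quartiles (default quantile method)
q <- quantile(x, probs = c(0.25, 0.75))
q
```

Note that `summary` rounds what it prints, so values from `quantile` may show more decimals.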
The output of the summary function changes depending on the object it is applied to
summary(mtcars)
mpg cyl disp hp
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
Median :19.20 Median :6.000 Median :196.3 Median :123.0
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
drat wt qsec vs
Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
Median :3.695 Median :3.325 Median :17.71 Median :0.0000
Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
am gear carb
Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
Median :0.0000 Median :4.000 Median :2.000
Mean :0.4062 Mean :3.688 Mean :2.812
3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :1.0000 Max. :5.000 Max. :8.000
Selected vector: "miles per gallon", mpg
mtcars$mpg
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
Greater than a given value
mtcars$mpg > 19.20
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
[25] FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE
Greater than the value returned by a function
mtcars$mpg > mean(mtcars$mpg)
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
[25] FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
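Because `TRUE` counts as 1 and `FALSE` as 0, a logical vector like the one above can be summarised directly. A small sketch; the name `above_mean` is illustrative.

```r
# Logical vector: TRUE where mpg exceeds the mean
above_mean <- mtcars$mpg > mean(mtcars$mpg)

# Number of cars above the mean
sum(above_mean)

# Proportion of cars above the mean
mean(above_mean)
```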
Variable holding the mean
Create the DF
Evaluate the values that meet both conditions
promedio_mpg <- mean(mtcars$mpg)
# Note: 33.90 is the maximum reported by summary(mtcars$mpg); the third quartile is 22.80
valor_3ercuartil <- 33.90
df_mpg <- data.frame(mpg = mtcars$mpg,
                     may_mean = mtcars$mpg > promedio_mpg,
                     men_3cuar = mtcars$mpg < valor_3ercuartil)
df_mpg
mpg may_mean men_3cuar
1 21.0 TRUE TRUE
2 21.0 TRUE TRUE
3 22.8 TRUE TRUE
4 21.4 TRUE TRUE
5 18.7 FALSE TRUE
6 18.1 FALSE TRUE
7 14.3 FALSE TRUE
8 24.4 TRUE TRUE
9 22.8 TRUE TRUE
10 19.2 FALSE TRUE
11 17.8 FALSE TRUE
12 16.4 FALSE TRUE
13 17.3 FALSE TRUE
14 15.2 FALSE TRUE
15 10.4 FALSE TRUE
16 10.4 FALSE TRUE
17 14.7 FALSE TRUE
18 32.4 TRUE TRUE
19 30.4 TRUE TRUE
20 33.9 TRUE FALSE
21 21.5 TRUE TRUE
22 15.5 FALSE TRUE
23 15.2 FALSE TRUE
24 13.3 FALSE TRUE
25 19.2 FALSE TRUE
26 27.3 TRUE TRUE
27 26.0 TRUE TRUE
28 30.4 TRUE TRUE
29 15.8 FALSE TRUE
30 19.7 FALSE TRUE
31 15.0 FALSE TRUE
32 21.4 TRUE TRUE
The logical condition is applied to each element i of the vector, and the result is a logical vector with the same length as the input.
For example, when i is 3, R checks whether 22.8 is greater than promedio_mpg and whether it is less than valor_3ercuartil
mtcars$mpg > promedio_mpg & mtcars$mpg < valor_3ercuartil
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
[25] FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
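The element-by-element evaluation can be made explicit by writing the same test as a loop over the index i. This is only a sketch: the three values in `x` and the bounds `lower` and `upper` are hypothetical.

```r
x <- c(18.7, 22.8, 33.9)  # hypothetical values
lower <- 20               # hypothetical lower bound
upper <- 33.9             # hypothetical upper bound

# Vectorised form: & compares element by element
vectorised <- x > lower & x < upper

# Explicit form: test each element i on its own
by_element <- vapply(seq_along(x),
                     function(i) x[i] > lower && x[i] < upper,
                     logical(1))

# Both approaches give the same logical vector
identical(vectorised, by_element)
```

Inside the loop each comparison involves a single value, which is the case `&&` is meant for; `&` is the operator to use on whole vectors.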
Operator & (and)
mtcars$mpg[mtcars$mpg > promedio_mpg & mtcars$mpg < valor_3ercuartil]
[1] 21.0 21.0 22.8 21.4 24.4 22.8 32.4 30.4 21.5 27.3 26.0 30.4 21.4
Operator == (double equals; 👀 not the same as assignment)
which(mtcars$mpg == 19.2)
[1] 10 25
which returns the indices of the elements that satisfy a condition
mtcars[mtcars$mpg == 21, ]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
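The indices returned by `which` and the logical vector itself select the same elements. The sketch below also adds a caution that is not part of the original slides: `==` on computed decimal numbers can fail because of floating-point rounding, and `all.equal` is one common alternative.

```r
mpg <- mtcars$mpg

# Positions where the condition holds
idx <- which(mpg == 19.2)

# Indexing by position or by logical vector selects the same values
identical(mpg[idx], mpg[mpg == 19.2])

# Caution: exact equality on computed doubles can be surprising
0.1 + 0.2 == 0.3

# Comparison with a small numerical tolerance
isTRUE(all.equal(0.1 + 0.2, 0.3))
```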
Operator != (not equal)
mtcars[mtcars$mpg != 21, c(1, 4:6)]
mpg hp drat wt
Datsun 710 22.8 93 3.85 2.320
Hornet 4 Drive 21.4 110 3.08 3.215
Hornet Sportabout 18.7 175 3.15 3.440
Valiant 18.1 105 2.76 3.460
Duster 360 14.3 245 3.21 3.570
Merc 240D 24.4 62 3.69 3.190
Merc 230 22.8 95 3.92 3.150
Merc 280 19.2 123 3.92 3.440
Merc 280C 17.8 123 3.92 3.440
Merc 450SE 16.4 180 3.07 4.070
Merc 450SL 17.3 180 3.07 3.730
Merc 450SLC 15.2 180 3.07 3.780
Cadillac Fleetwood 10.4 205 2.93 5.250
Lincoln Continental 10.4 215 3.00 5.424
Chrysler Imperial 14.7 230 3.23 5.345
Fiat 128 32.4 66 4.08 2.200
Honda Civic 30.4 52 4.93 1.615
Toyota Corolla 33.9 65 4.22 1.835
Toyota Corona 21.5 97 3.70 2.465
Dodge Challenger 15.5 150 2.76 3.520
AMC Javelin 15.2 150 3.15 3.435
Camaro Z28 13.3 245 3.73 3.840
Pontiac Firebird 19.2 175 3.08 3.845
Fiat X1-9 27.3 66 4.08 1.935
Porsche 914-2 26.0 91 4.43 2.140
Lotus Europa 30.4 113 3.77 1.513
Ford Pantera L 15.8 264 4.22 3.170
Ferrari Dino 19.7 175 3.62 2.770
Maserati Bora 15.0 335 3.54 3.570
Volvo 142E 21.4 109 4.11 2.780
Operator | (or)
mtcars[which(mtcars$mpg == 21 | mtcars$mpg == 22.8), 1:3]
mpg cyl disp
Mazda RX4 21.0 6 160.0
Mazda RX4 Wag 21.0 6 160.0
Datsun 710 22.8 4 108.0
Merc 230 22.8 4 140.8
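When several `==` tests on the same column are joined with `|`, the `%in%` operator is a compact alternative. A sketch comparing both forms; the object names are illustrative.

```r
# Rows where mpg equals 21 or 22.8, written with |
with_or <- mtcars[mtcars$mpg == 21 | mtcars$mpg == 22.8, 1:3]

# The same selection written with %in%
with_in <- mtcars[mtcars$mpg %in% c(21, 22.8), 1:3]

# Both forms return the same rows and columns
identical(with_or, with_in)
```

The `%in%` form tends to stay readable as the list of accepted values grows.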
# View(datasets::LifeCycleSavings)
About the contents of the DF: according to the life-cycle savings hypothesis developed by Franco Modigliani, the savings ratio (aggregate personal saving divided by disposable income) is explained by per-capita disposable income, the percentage rate of change in per-capita disposable income, and two demographic variables: the percentage of the population under 15 and the percentage of the population over 75. The data are averaged over the decade 1960-1970 to remove the business cycle and other short-term fluctuations.
df_ahorro <- datasets::LifeCycleSavings
head(df_ahorro, 8)
sr pop15 pop75 dpi ddpi
Australia 11.43 29.35 2.87 2329.68 2.87
Austria 12.07 23.32 4.41 1507.99 3.93
Belgium 13.17 23.80 4.43 2108.47 3.82
Bolivia 5.75 41.89 1.67 189.13 0.22
Brazil 12.88 42.19 0.83 728.47 4.56
Canada 8.79 31.72 2.85 2982.88 2.43
Chile 0.60 39.74 1.34 662.86 2.67
China 11.90 44.75 0.67 289.52 6.51
Dimensions
dim(df_ahorro)
[1] 50 5
Number of columns
ncol(df_ahorro)
[1] 5
Number of rows
nrow(df_ahorro)
[1] 50
Column names
colnames(df_ahorro)
[1] "sr" "pop15" "pop75" "dpi" "ddpi"
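These functions are related: `dim` returns the number of rows followed by the number of columns, so its elements match `nrow` and `ncol`. A brief sketch; the name `d` is illustrative.

```r
df_ahorro <- datasets::LifeCycleSavings

# dim() returns c(rows, columns)
d <- dim(df_ahorro)

# First element is the row count, second the column count
d[1] == nrow(df_ahorro)
d[2] == ncol(df_ahorro)

# For a data frame, colnames() and names() give the same result
identical(colnames(df_ahorro), names(df_ahorro))
```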
Row names
rownames(df_ahorro)
[1] "Australia" "Austria" "Belgium" "Bolivia"
[5] "Brazil" "Canada" "Chile" "China"
[9] "Colombia" "Costa Rica" "Denmark" "Ecuador"
[13] "Finland" "France" "Germany" "Greece"
[17] "Guatamala" "Honduras" "Iceland" "India"
[21] "Ireland" "Italy" "Japan" "Korea"
[25] "Luxembourg" "Malta" "Norway" "Netherlands"
[29] "New Zealand" "Nicaragua" "Panama" "Paraguay"
[33] "Peru" "Philippines" "Portugal" "South Africa"
[37] "South Rhodesia" "Spain" "Sweden" "Switzerland"
[41] "Turkey" "Tunisia" "United Kingdom" "United States"
[45] "Venezuela" "Zambia" "Jamaica" "Uruguay"
[49] "Libya" "Malaysia"
Structure of an object
str(df_ahorro)
'data.frame': 50 obs. of 5 variables:
$ sr : num 11.43 12.07 13.17 5.75 12.88 ...
$ pop15: num 29.4 23.3 23.8 41.9 42.2 ...
$ pop75: num 2.87 4.41 4.43 1.67 0.83 2.85 1.34 0.67 1.06 1.14 ...
$ dpi : num 2330 1508 2108 189 728 ...
$ ddpi : num 2.87 3.93 3.82 0.22 4.56 2.43 2.67 6.51 3.08 2.8 ...
Statistical summary
summary(df_ahorro)
sr pop15 pop75 dpi
Min. : 0.600 Min. :21.44 Min. :0.560 Min. : 88.94
1st Qu.: 6.970 1st Qu.:26.21 1st Qu.:1.125 1st Qu.: 288.21
Median :10.510 Median :32.58 Median :2.175 Median : 695.66
Mean : 9.671 Mean :35.09 Mean :2.293 Mean :1106.76
3rd Qu.:12.617 3rd Qu.:44.06 3rd Qu.:3.325 3rd Qu.:1795.62
Max. :21.100 Max. :47.64 Max. :4.700 Max. :4001.89
ddpi
Min. : 0.220
1st Qu.: 2.002
Median : 3.000
Mean : 3.758
3rd Qu.: 4.478
Max. :16.710
Check the row names
rownames(df_ahorro)[1:10]
[1] "Australia" "Austria" "Belgium" "Bolivia" "Brazil"
[6] "Canada" "Chile" "China" "Colombia" "Costa Rica"
Add a new column to the DF
df_ahorro$pais <- rownames(df_ahorro)
Using the pop15 and pop75 values, compute their mean for each observation
# mean() treats its second argument as `trim`, so the row-wise mean is computed with rowMeans()
df_ahorro$edad_promedio <- rowMeans(df_ahorro[, c("pop15", "pop75")])
Check the DF
head(df_ahorro, 5)
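A per-observation mean of two columns needs a row-wise operation, because `mean` collapses its first argument to a single number and reads a second positional argument as `trim`. A minimal sketch on a small data frame built from the first two rows shown earlier; the column names `row_mean` and `by_hand` are hypothetical.

```r
# First two rows of pop15 and pop75 from the head() output above
df <- data.frame(pop15 = c(29.35, 23.32),
                 pop75 = c(2.87, 4.41))

# Row-wise mean: one value per observation
df$row_mean <- rowMeans(df[, c("pop15", "pop75")])

# The same result written as plain vector arithmetic
df$by_hand <- (df$pop15 + df$pop75) / 2

all.equal(df$row_mean, df$by_hand)

# By contrast, mean() over all values returns a single number
mean(c(df$pop15, df$pop75))
```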
Minimum of a vector
min(df_ahorro$pop15)
[1] 21.44
Maximum of a vector
max(df_ahorro$pop15)
[1] 47.64
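Both extremes can be obtained in one call with `range`, and `which.min` / `which.max` give the position of each extreme, which can then be used to look up the corresponding row name. A sketch under the assumption that the row names are still the country names.

```r
df_ahorro <- datasets::LifeCycleSavings

# Minimum and maximum in a single call
range(df_ahorro$pop15)

# Country with the smallest share of population under 15
rownames(df_ahorro)[which.min(df_ahorro$pop15)]

# Country with the largest share of population under 15
rownames(df_ahorro)[which.max(df_ahorro$pop15)]
```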
df_gapminder <- read.csv('https://raw.githubusercontent.com/javendaXgh/datos/refs/heads/master/gapminder.csv')
head(df_gapminder, 3)
X country continent year lifeExp pop gdpPercap
1 1 Afghanistan Asia 1952 28.801 8425333 779.4453
2 2 Afghanistan Asia 1957 30.332 9240934 820.8530
3 3 Afghanistan Asia 1962 31.997 10267083 853.1007
👀: the data have a tabular structure. Observations are rows and attributes are columns
DF structure
str(df_gapminder)
'data.frame': 1704 obs. of 7 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ country : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
$ continent: chr "Asia" "Asia" "Asia" "Asia" ...
$ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
$ lifeExp : num 28.8 30.3 32 34 36.1 ...
$ pop : int 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
$ gdpPercap: num 779 821 853 836 740 ...
summary(df_gapminder)
X country continent year
Min. : 1.0 Length:1704 Length:1704 Min. :1952
1st Qu.: 426.8 Class :character Class :character 1st Qu.:1966
Median : 852.5 Mode :character Mode :character Median :1980
Mean : 852.5 Mean :1980
3rd Qu.:1278.2 3rd Qu.:1993
Max. :1704.0 Max. :2007
lifeExp pop gdpPercap
Min. :23.60 Min. :6.001e+04 Min. : 241.2
1st Qu.:48.20 1st Qu.:2.794e+06 1st Qu.: 1202.1
Median :60.71 Median :7.024e+06 Median : 3531.8
Mean :59.47 Mean :2.960e+07 Mean : 7215.3
3rd Qu.:70.85 3rd Qu.:1.959e+07 3rd Qu.: 9325.5
Max. :82.60 Max. :1.319e+09 Max. :113523.1
It has a tabular structure
Countries
unique(df_gapminder$country)
[1] "Afghanistan" "Albania"
[3] "Algeria" "Angola"
[5] "Argentina" "Australia"
[7] "Austria" "Bahrain"
[9] "Bangladesh" "Belgium"
[11] "Benin" "Bolivia"
[13] "Bosnia and Herzegovina" "Botswana"
[15] "Brazil" "Bulgaria"
[17] "Burkina Faso" "Burundi"
[19] "Cambodia" "Cameroon"
[21] "Canada" "Central African Republic"
[23] "Chad" "Chile"
[25] "China" "Colombia"
[27] "Comoros" "Congo, Dem. Rep."
[29] "Congo, Rep." "Costa Rica"
[31] "Cote d'Ivoire" "Croatia"
[33] "Cuba" "Czech Republic"
[35] "Denmark" "Djibouti"
[37] "Dominican Republic" "Ecuador"
[39] "Egypt" "El Salvador"
[41] "Equatorial Guinea" "Eritrea"
[43] "Ethiopia" "Finland"
[45] "France" "Gabon"
[47] "Gambia" "Germany"
[49] "Ghana" "Greece"
[51] "Guatemala" "Guinea"
[53] "Guinea-Bissau" "Haiti"
[55] "Honduras" "Hong Kong, China"
[57] "Hungary" "Iceland"
[59] "India" "Indonesia"
[61] "Iran" "Iraq"
[63] "Ireland" "Israel"
[65] "Italy" "Jamaica"
[67] "Japan" "Jordan"
[69] "Kenya" "Korea, Dem. Rep."
[71] "Korea, Rep." "Kuwait"
[73] "Lebanon" "Lesotho"
[75] "Liberia" "Libya"
[77] "Madagascar" "Malawi"
[79] "Malaysia" "Mali"
[81] "Mauritania" "Mauritius"
[83] "Mexico" "Mongolia"
[85] "Montenegro" "Morocco"
[87] "Mozambique" "Myanmar"
[89] "Namibia" "Nepal"
[91] "Netherlands" "New Zealand"
[93] "Nicaragua" "Niger"
[95] "Nigeria" "Norway"
[97] "Oman" "Pakistan"
[99] "Panama" "Paraguay"
[101] "Peru" "Philippines"
[103] "Poland" "Portugal"
[105] "Puerto Rico" "Reunion"
[107] "Romania" "Rwanda"
[109] "Sao Tome and Principe" "Saudi Arabia"
[111] "Senegal" "Serbia"
[113] "Sierra Leone" "Singapore"
[115] "Slovak Republic" "Slovenia"
[117] "Somalia" "South Africa"
[119] "Spain" "Sri Lanka"
[121] "Sudan" "Swaziland"
[123] "Sweden" "Switzerland"
[125] "Syria" "Taiwan"
[127] "Tanzania" "Thailand"
[129] "Togo" "Trinidad and Tobago"
[131] "Tunisia" "Turkey"
[133] "Uganda" "United Kingdom"
[135] "United States" "Uruguay"
[137] "Venezuela" "Vietnam"
[139] "West Bank and Gaza" "Yemen, Rep."
[141] "Zambia" "Zimbabwe"
Years
unique(df_gapminder$year)
[1] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007
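`unique` lists the distinct values; counting them, or counting how many rows each value has, takes one more step. A sketch on a small hypothetical data frame (the rows below are made up for illustration and are not the gapminder data).

```r
# Hypothetical mini data frame with repeated countries
df <- data.frame(country = c("Venezuela", "Venezuela",
                             "Colombia", "Colombia", "Colombia"),
                 year = c(1952, 1957, 1952, 1957, 1962))

# Number of distinct countries
length(unique(df$country))

# Number of rows per country
table(df$country)
```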
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
The filter function.
df_venezuela <- df_gapminder %>%
  filter(country == 'Venezuela')
df_venezuela
X country continent year lifeExp pop gdpPercap
1 1633 Venezuela Americas 1952 55.088 5439568 7689.800
2 1634 Venezuela Americas 1957 57.907 6702668 9802.467
3 1635 Venezuela Americas 1962 60.770 8143375 8422.974
4 1636 Venezuela Americas 1967 63.479 9709552 9541.474
5 1637 Venezuela Americas 1972 65.712 11515649 10505.260
6 1638 Venezuela Americas 1977 67.456 13503563 13143.951
7 1639 Venezuela Americas 1982 68.557 15620766 11152.410
8 1640 Venezuela Americas 1987 70.190 17910182 9883.585
9 1641 Venezuela Americas 1992 71.150 20265563 10733.926
10 1642 Venezuela Americas 1997 72.146 22374398 10165.495
11 1643 Venezuela Americas 2002 72.766 24287670 8605.048
12 1644 Venezuela Americas 2007 73.747 26084662 11415.806
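`filter` accepts several conditions separated by commas, which are combined with "and", much like `&` in the base R indexing shown earlier. The sketch below assumes the dplyr package is installed and uses a small data frame with four rows copied from the output shown in this section.

```r
library(dplyr)

# Four rows taken from the gapminder output shown in this section
df <- data.frame(country = c("Venezuela", "Venezuela", "Colombia", "Colombia"),
                 year = c(1952L, 2007L, 1952L, 2007L),
                 lifeExp = c(55.088, 73.747, 50.643, 72.889))

# dplyr: conditions separated by commas must all hold
recent_vzla <- df %>%
  filter(country == "Venezuela", year >= 2000)

# Base R equivalent with logical indexing and &
base_r <- df[df$country == "Venezuela" & df$year >= 2000, ]

recent_vzla
```

Both versions select the same observation; they differ only in how the result's row names are numbered.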
Summary table
summary(df_venezuela)
X country continent year
Min. :1633 Length:12 Length:12 Min. :1952
1st Qu.:1636 Class :character Class :character 1st Qu.:1966
Median :1638 Mode :character Mode :character Median :1980
Mean :1638 Mean :1980
3rd Qu.:1641 3rd Qu.:1993
Max. :1644 Max. :2007
lifeExp pop gdpPercap
Min. :55.09 Min. : 5439568 Min. : 7690
1st Qu.:62.80 1st Qu.: 9318008 1st Qu.: 9307
Median :68.01 Median :14562164 Median :10025
Mean :66.58 Mean :15129801 Mean :10089
3rd Qu.:71.40 3rd Qu.:20792772 3rd Qu.:10839
Max. :73.75 Max. :26084662 Max. :13144
df_colombia <- df_gapminder %>%
  filter(country == 'Colombia')
df_colombia
X country continent year lifeExp pop gdpPercap
1 301 Colombia Americas 1952 50.643 12350771 2144.115
2 302 Colombia Americas 1957 55.118 14485993 2323.806
3 303 Colombia Americas 1962 57.863 17009885 2492.351
4 304 Colombia Americas 1967 59.963 19764027 2678.730
5 305 Colombia Americas 1972 61.623 22542890 3264.660
6 306 Colombia Americas 1977 63.837 25094412 3815.808
7 307 Colombia Americas 1982 66.653 27764644 4397.576
8 308 Colombia Americas 1987 67.768 30964245 4903.219
9 309 Colombia Americas 1992 68.421 34202721 5444.649
10 310 Colombia Americas 1997 70.313 37657830 6117.362
11 311 Colombia Americas 2002 71.682 41008227 5755.260
12 312 Colombia Americas 2007 72.889 44227550 7006.580
The bind_rows function
df_gran_colombia <- bind_rows(df_venezuela,
                              df_colombia)
df_gran_colombia
X country continent year lifeExp pop gdpPercap
1 1633 Venezuela Americas 1952 55.088 5439568 7689.800
2 1634 Venezuela Americas 1957 57.907 6702668 9802.467
3 1635 Venezuela Americas 1962 60.770 8143375 8422.974
4 1636 Venezuela Americas 1967 63.479 9709552 9541.474
5 1637 Venezuela Americas 1972 65.712 11515649 10505.260
6 1638 Venezuela Americas 1977 67.456 13503563 13143.951
7 1639 Venezuela Americas 1982 68.557 15620766 11152.410
8 1640 Venezuela Americas 1987 70.190 17910182 9883.585
9 1641 Venezuela Americas 1992 71.150 20265563 10733.926
10 1642 Venezuela Americas 1997 72.146 22374398 10165.495
11 1643 Venezuela Americas 2002 72.766 24287670 8605.048
12 1644 Venezuela Americas 2007 73.747 26084662 11415.806
13 301 Colombia Americas 1952 50.643 12350771 2144.115
14 302 Colombia Americas 1957 55.118 14485993 2323.806
15 303 Colombia Americas 1962 57.863 17009885 2492.351
16 304 Colombia Americas 1967 59.963 19764027 2678.730
17 305 Colombia Americas 1972 61.623 22542890 3264.660
18 306 Colombia Americas 1977 63.837 25094412 3815.808
19 307 Colombia Americas 1982 66.653 27764644 4397.576
20 308 Colombia Americas 1987 67.768 30964245 4903.219
21 309 Colombia Americas 1992 68.421 34202721 5444.649
22 310 Colombia Americas 1997 70.313 37657830 6117.362
23 311 Colombia Americas 2002 71.682 41008227 5755.260
24 312 Colombia Americas 2007 72.889 44227550 7006.580
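`bind_rows` stacks data frames by matching column names, and a column missing from one input is filled with `NA`. The sketch below assumes dplyr is installed; the two small data frames reuse a few values from the output above, and dropping `pop` from the second one is purely for illustration.

```r
library(dplyr)

# Two small data frames; df_b deliberately lacks the pop column
df_a <- data.frame(country = "Venezuela",
                   year = c(1952L, 1957L),
                   pop = c(5439568L, 6702668L))
df_b <- data.frame(country = "Colombia",
                   year = c(1952L, 1957L))

# Rows are stacked in the order the data frames are given
stacked <- bind_rows(df_a, df_b)
stacked

# The result has as many rows as both inputs together
nrow(stacked) == nrow(df_a) + nrow(df_b)
```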